library(tidyverse) # for data wrangling etc
library(rstanarm) # for fitting models in STAN
library(cmdstanr) # for cmdstan
library(brms) # for fitting models in STAN
library(coda) # for diagnostics
library(bayesplot) # for diagnostics
library(ggmcmc) # for MCMC diagnostics
library(DHARMa) # for residual diagnostics
library(rstan) # for interfacing with STAN
library(emmeans) # for marginal means etc
library(broom) # for tidying outputs
library(tidybayes) # for more tidying outputs
library(ggeffects) # for partial plots
library(broom.mixed) # for summarising models
library(bayestestR) # for ROPE
library(see) # for some plots
library(easystats) # for the easystats ecosystem
library(INLA) # for approximate Bayes
library(INLAutils) # for additional INLA outputs
library(patchwork) # for multiple plots
library(modelsummary) # for data and model summaries
theme_set(theme_grey()) # put the default ggplot theme back
source("helperFunctions.R")

Bayesian GLM Part 2
1 Preparations
Load the necessary libraries
2 Scenario
Polis et al. (1998) were interested in modelling the presence/absence of lizards (Uta sp.) against the perimeter to area ratio of 19 islands in the Gulf of California.
| ISLAND | RATIO | PA |
|---|---|---|
| Bota | 15.41 | 1 |
| Cabeza | 5.63 | 1 |
| Cerraja | 25.92 | 1 |
| Coronadito | 15.17 | 0 |
| .. | .. | .. |
| Variable | Description |
|---|---|
| ISLAND | Categorical listing of the name of the 19 islands used - variable not used in analysis. |
| RATIO | Ratio of perimeter to area of the island. |
| PA | Presence (1) or absence (0) of Uta lizards on island. |
The aim of the analysis is to investigate the relationship between island perimeter to area ratio and the presence/absence of Uta lizards.
3 Read in the data
Rows: 19 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): ISLAND
dbl (2): RATIO, PA
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
spc_tbl_ [19 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ ISLAND: chr [1:19] "Bota" "Cabeza" "Cerraja" "Coronadito" ...
$ RATIO : num [1:19] 15.41 5.63 25.92 15.17 13.04 ...
$ PA : num [1:19] 1 1 1 0 1 0 0 0 0 1 ...
- attr(*, "spec")=
.. cols(
.. ISLAND = col_character(),
.. RATIO = col_double(),
.. PA = col_double()
.. )
- attr(*, "problems")=<externalptr>
polis (19 rows and 3 variables, 3 shown)
ID | Name | Type | Missings | Values | N
---+--------+-----------+----------+---------------+-----------
1 | ISLAND | character | 0 (0.0%) | Angeldlg | 1 ( 5.3%)
| | | | Bahiaan | 1 ( 5.3%)
| | | | Bahiaas | 1 ( 5.3%)
| | | | Blanca | 1 ( 5.3%)
| | | | Bota | 1 ( 5.3%)
| | | | Cabeza | 1 ( 5.3%)
| | | | Cerraja | 1 ( 5.3%)
| | | | Coronadito | 1 ( 5.3%)
| | | | Flecha | 1 ( 5.3%)
| | | | Gemelose | 1 ( 5.3%)
| | | | (...) |
---+--------+-----------+----------+---------------+-----------
2 | RATIO | numeric | 0 (0.0%) | [0.21, 63.16] | 19
---+--------+-----------+----------+---------------+-----------
3 | PA | numeric | 0 (0.0%) | 0 | 9 (47.4%)
| | | | 1 | 10 (52.6%)
---------------------------------------------------------------
| Variable | Unique | Missing Pct. | Mean | SD | Min | Median | Max | Histogram |
|---|---|---|---|---|---|---|---|---|
| RATIO | 19 | 0 | 18.7 | 17.5 | 0.2 | 15.2 | 63.2 | |
| PA | 2 | 0 | 0.5 | 0.5 | 0.0 | 1.0 | 1.0 | |
| ISLAND | N | % |
|---|---|---|
| Angeldlg | 1 | 5.3 |
| Bahiaan | 1 | 5.3 |
| Bahiaas | 1 | 5.3 |
| Blanca | 1 | 5.3 |
| Bota | 1 | 5.3 |
| Cabeza | 1 | 5.3 |
| Cerraja | 1 | 5.3 |
| Coronadito | 1 | 5.3 |
| Flecha | 1 | 5.3 |
| Gemelose | 1 | 5.3 |
| Gemelosw | 1 | 5.3 |
| Jorabado | 1 | 5.3 |
| Mejia | 1 | 5.3 |
| Mitlan | 1 | 5.3 |
| Pata | 1 | 5.3 |
| Pescador | 1 | 5.3 |
| Piojo | 1 | 5.3 |
| Smith | 1 | 5.3 |
| Ventana | 1 | 5.3 |
4 Exploratory data analysis
The individual responses (\(y_i\), observed presence/absence of Uta lizards) are each expected to have been independently drawn from Bernoulli (or binomial) distributions (\(\mathcal{Bin}\)). These distributions represent all the possible presences/absences we could have obtained at the specific (\(i^{th}\)) level of island perimeter to area ratio. Hence the \(i^{th}\) presence/absence observation is expected to have been drawn from a binomial distribution with a probability of \(\mu_i\) and a size of \(n=1\).
The expected probabilities are related to the linear predictor (intercept plus slope associated with perimeter to area ratio) via a logit link.
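The logit link can be illustrated with base R's built-in `qlogis()` and `plogis()` functions (a small sketch, not part of the original analysis):

```r
# The logit link and its inverse: qlogis(p) = log(p / (1 - p)) maps
# probabilities in (0, 1) onto the whole real line, while plogis(eta)
# maps a linear predictor back onto the probability scale.
p <- c(0.1, 0.5, 0.9)
eta <- qlogis(p)        # logit scale: approx -2.197, 0, 2.197
round(eta, 3)
round(plogis(eta), 1)   # back-transforming recovers the probabilities
```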
We need to supply priors for each of the parameters to be estimated (\(\beta_0\) and \(\beta_1\)). Whilst we want these priors to be sufficiently vague as to not influence the outcomes of the analysis (and thus be equivalent to the frequentist analysis), we do not want the priors to be so vague (wide) that they permit the MCMC sampler to drift off into parameter space that is both illogical as well as numerically awkward.
As a starting point, lets assign the following priors:
- \(\beta_0\): Normal prior centred at 0 with a standard deviation of 2.5
- \(\beta_1\): Normal prior centred at 0 with a standard deviation of 1
Note, when fitting models through either rstanarm or brms, the priors assume that the predictor(s) have been centred and are to be applied on the link scale. In this case the link scale is the logit.
Model formula: \[ \begin{align} y_i &\sim{} \mathcal{Bin}(n, p_i)\\ \log\left(\frac{p_i}{1-p_i}\right) &= \beta_0 + \beta_1 x_i\\ \beta_0 &\sim{} \mathcal{N}(0,2.5)\\ \beta_1 &\sim{} \mathcal{N}(0,1)\\ \end{align} \]
5 Fit the model
Call:
glm(formula = PA ~ RATIO, family = binomial(), data = polis)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.6061 1.6953 2.127 0.0334 *
RATIO -0.2196 0.1005 -2.184 0.0289 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 26.287 on 18 degrees of freedom
Residual deviance: 14.221 on 17 degrees of freedom
AIC: 18.221
Number of Fisher Scoring iterations: 6
Call:
glm(formula = PA ~ center(RATIO), family = binomial(), data = polis)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.5081 0.7395 -0.687 0.4920
center(RATIO) -0.2196 0.1005 -2.184 0.0289 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 26.287 on 18 degrees of freedom
Residual deviance: 14.221 on 17 degrees of freedom
AIC: 18.221
Number of Fisher Scoring iterations: 6
In rstanarm, the default priors are designed to be weakly informative. They are chosen to provide moderate regularisation (to help prevent over-fitting) and help stabilise the computations.
In the above:
- the formula and data arguments should be familiar as they are the same as for many models in R
- iter: indicates the number of MCMC iterations to perform per chain
- warmup: indicates how many of the initial MCMC samples to exclude. During the warmup stage, the MCMC sampler is tuning its sampling operations to allow it to determine the best way to create the chain. While it is in this discovery phase, the estimates it produces might be unrepresentative.
- chains: indicates the number of independent chains to run. The more chains, the more random starts. The point of multiple random starts is to minimise the potential for the starting point to dictate the outcome. It is possible that a chain might get stuck sampling a single feature of the posterior and not discover additional features. Additional random starting points should hopefully alleviate this. Either 3 or 4 is typical.
- thin: indicates the rate at which the MCMC samples should be thinned in order to eliminate auto-correlation between MCMC samples. When the step lengths between MCMC samples are small, the resulting parameter estimates will be auto-correlated (since samples close by in the chain will be more similar than those far apart). The rate of thinning necessary will depend on the degree of auto-correlation. For stan models, a thinning rate of 5 is a good starting point.
- refresh: indicates how often the terminal display should be updated. When running a large model, this can be useful as it provides a form of progress. However, when the routine is within an Rmarkdown document, it just results in the output including each line of progress and a lot of space is taken up unnecessarily
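The effect described for `thin` can be sketched with a toy autocorrelated series (an AR(1) simulation for illustration, not actual MCMC output):

```r
# Toy illustration: thinning an autocorrelated AR(1) series by 5
# substantially reduces the lag-1 autocorrelation between retained samples.
set.seed(1)
n <- 10000
chain <- numeric(n)
for (i in 2:n) chain[i] <- 0.8 * chain[i - 1] + rnorm(1)
rho_full <- acf(chain, plot = FALSE)$acf[2]    # lag-1 autocorrelation, approx 0.8
thinned <- chain[seq(1, n, by = 5)]
rho_thin <- acf(thinned, plot = FALSE)$acf[2]  # approx 0.8^5 = 0.33
round(c(full = rho_full, thinned = rho_thin), 2)
```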
Having allowed rstanarm to formulate its own weakly informative priors, it is a good idea to explore what they are. Firstly, out of curiosity, it might be interesting to see what it has chosen. However, more importantly, we need to be able to document what the priors were and the rstanarm development team make it very clear that there is no guarantee that the default priors will remain the same into the future.
Priors for model 'polis.rstanarm'
------
Intercept (after predictors centered)
~ normal(location = 0, scale = 2.5)
Coefficients
Specified prior:
~ normal(location = 0, scale = 2.5)
Adjusted prior:
~ normal(location = 0, scale = 0.14)
------
See help('prior_summary.stanreg') for more details
This tells us:
- for the intercept: the mean of the response is approximately 0.53. All priors are applied on the link scale (which by default for logistic regression is logit).
`rstanarm` takes the view that if the predictors are centred (which they are by default), the mean of the response will be close to 0.5. On the logit scale, this equates to a value of 0. Hence, the prior on the intercept is defined as a normal prior with a mean of 0. For logistic regression, the response values must all be either 0 or 1. On the logit scale, values can range from -Inf to Inf. Nevertheless, values on the logit scale that exceed 2.5 in magnitude equate to values on the response scale that are either very high (if positive) or very low (if negative). `rstanarm` therefore uses a standard deviation of 2.5 for the normal prior on the intercept.
- for the coefficients (in this case, just the slope), the default prior is a normal prior centred around 0 with a standard deviation of 2.5 times the reciprocal of the standard deviation of the predictor (so 0.14 in this case).
- there are no auxiliary parameters and thus there are no auxiliary priors
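The adjusted prior scale reported above can be reproduced by hand (a sketch using the RATIO values listed in the data earlier; not part of the original code):

```r
# Reproduce rstanarm's adjusted prior scale for the slope: 2.5 * 1 / sd(x).
# RATIO values as listed in the polis data above.
ratio <- c(15.41, 5.63, 25.92, 15.17, 13.04, 18.85, 30.95, 22.87, 12.01,
           11.60, 6.09, 2.28, 4.05, 59.94, 63.16, 22.76, 23.54, 0.21, 2.55)
round(2.5 / sd(ratio), 2)   # 0.14, matching the "Adjusted prior" output
```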
One way to assess the priors is to have the MCMC sampler sample purely from the prior predictive distribution without conditioning on the observed data. Doing so provides a glimpse at the range of predictions possible under the priors. On the one hand, wide ranging predictions would ensure that the priors are unlikely to influence the actual predictions once they are conditioned on the data. On the other hand, if they are too wide, the sampler is being permitted to traverse into regions of parameter space that are not logically possible in the context of the actual underlying ecological context. Not only could this mean that illogical parameter estimates are possible, when the sampler is traversing regions of parameter space that are not supported by the actual data, the sampler can become unstable and have difficulty.
We can draw from the prior predictive distribution instead of conditioning on the response, by updating the model and indicating prior_PD=TRUE. After refitting the model in this way, we can plot the predictions to gain insights into the range of predictions supported by the priors alone.
Data points may overlap. Use the `jitter` argument to add some amount of
random variation to the location of data points and avoid overplotting.
Conclusions:
- we see that the range of predictions is very wide and the slope could range from strongly negative to strongly positive.
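The prior-predictive behaviour described above can be mimicked in base R by drawing directly from the priors proposed in the model formula (\(\beta_0 \sim \mathcal{N}(0, 2.5)\), \(\beta_1 \sim \mathcal{N}(0, 1)\)); the centred RATIO range of roughly -18 to 44 is taken from the data summaries above. A rough sketch, not a replacement for the Stan-based prior predictive draws:

```r
# Draw (beta0, beta1) from the priors and examine the implied probabilities
# at the extremes of a (roughly) centred RATIO.
set.seed(123)
beta0 <- rnorm(1000, 0, 2.5)
beta1 <- rnorm(1000, 0, 1)
p_low  <- plogis(beta0 + beta1 * -18)   # predictions at the smallest RATIO
p_high <- plogis(beta0 + beta1 * 44)    # predictions at the largest RATIO
# The predictions span essentially the full (0, 1) range
round(range(p_low), 3)
round(range(p_high), 3)
```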
The following link provides some guidance about defining priors: https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations
When defining our own priors, we typically do not want them to be auto-scaled.
If we wanted to define our own priors that were less vague, yet still not likely to bias the outcomes, we could try the following priors (mainly plucked out of thin air):
- \(\beta_0\): normal centred at 0 with a standard deviation of 2.5
- mean: 0 - because the mean response when predictors are centred should be approx. 0.5 (which is 0 on logit scale)
- sd: 2.5 - since this represents wide priors on logit scale
- \(\beta_1\): normal centred at 0 with a standard deviation of 0.1
- sd: 0.1 - since `2.5 * sd(polis$PA)/sd(polis$RATIO)` is approximately 0.07
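The scaling rationale in the last bullet can be checked directly (using the PA and RATIO values listed in the data shown earlier):

```r
# Verify the rough scaling behind the slope prior sd: 2.5 * sd(y) / sd(x).
pa <- c(1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1)
ratio <- c(15.41, 5.63, 25.92, 15.17, 13.04, 18.85, 30.95, 22.87, 12.01,
           11.60, 6.09, 2.28, 4.05, 59.94, 63.16, 22.76, 23.54, 0.21, 2.55)
round(2.5 * sd(pa) / sd(ratio), 2)   # approx 0.07, so 0.1 is a mildly generous choice
```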
I will also overlay the raw data for comparison.
Data points may overlap. Use the `jitter` argument to add some amount of
random variation to the location of data points and avoid overplotting.
Now lets refit, conditioning on the data.
posterior_vs_prior(polis.rstanarm3,
color_by = "vs", group_by = TRUE,
facet_args = list(scales = "free_y")
)
Drawing from prior...
Conclusions:
- in each case, the prior is substantially wider than the posterior, suggesting that the posterior is not biased towards the prior.
Data points may overlap. Use the `jitter` argument to add some amount of
random variation to the location of data points and avoid overplotting.
ggemmeans(polis.rstanarm3, terms = "RATIO[0:63]") |>
plot(
show_data = TRUE,
show_residuals = TRUE,
jitter = FALSE
)

Data points may overlap. Use the `jitter` argument to add some amount of
random variation to the location of data points and avoid overplotting.
Data points may overlap. Use the `jitter` argument to add some amount of
random variation to the location of data points and avoid overplotting.
Data points may overlap. Use the `jitter` argument to add some amount of
random variation to the location of data points and avoid overplotting.
In brms, the default priors are designed to be weakly informative. They are chosen to provide moderate regularisation (to help prevent over fitting) and help stabilise the computations.
Unlike rstanarm, brms models must be compiled before they start sampling. For most models, the compilation of the stan code takes around 45 seconds.
form <- bf(PA | trials(1) ~ RATIO, family = binomial())
polis.brm <- brm(form,
data = polis,
iter = 5000,
warmup = 1000,
chains = 3, cores = 3,
thin = 5,
refresh = 0,
backend = "rstan"
)

Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.
Compiling Stan program...
Start sampling
In the above:
- the formula and data arguments should be familiar as they are the same as for many models in R. That said, the formula specification of `brms` can define models that are not possible with most other routines. To facilitate this enhanced functionality, we usually define a `brms` formula within its own `bf()` function along with the `family` (in this case, binomial).
- iter: indicates the number of MCMC iterations to perform per chain
- warmup: indicates how many of the initial MCMC samples to exclude. During the warmup stage, the MCMC sampler is tuning its sampling operations to allow it to determine the best way to create the chain. While it is in this discovery phase, the estimates it produces might be unrepresentative.
- chains: indicates the number of independent chains to run. The more chains, the more random starts. The point of multiple random starts is to minimise the potential for the starting point to dictate the outcome. It is possible that a chain might get stuck sampling a single feature of the posterior and not discover additional features. Additional random starting points should hopefully alleviate this. Either 3 or 4 is typical.
- thin: indicates the rate at which the MCMC samples should be thinned in order to eliminate auto-correlation between MCMC samples. When the step lengths between MCMC samples are small, the resulting parameter estimates will be auto-correlated (since samples close by in the chain will be more similar than those far apart). The rate of thinning necessary will depend on the degree of auto-correlation. For stan models, a thinning rate of 5 is a good starting point.
- refresh: indicates how often the terminal display should be updated. When running a large model, this can be useful as it provides a form of progress. However, when the routine is within an Rmarkdown document, it just results in the output including each line of progress and a lot of space is taken up unnecessarily
Having allowed brms to formulate its own weakly informative priors, it is a good idea to explore what they are. Firstly, out of curiosity, it might be interesting to see what it has chosen. However, more importantly, we need to be able to document what the priors were and the brms development team make it very clear that there is no guarantee that the default priors will remain the same into the future.
prior class coef group resp dpar nlpar lb ub source
(flat) b default
(flat) b RATIO (vectorized)
student_t(3, 0, 2.5) Intercept default
This tells us:
- for the intercept, it is using a Student t (a flatter normal) prior with 3 degrees of freedom, a location of 0 and a scale of 2.5. These are the defaults used for a binomial model. The mean of the response is approximately 0.53. All priors are applied on the link scale (which by default for logistic regression is logit). `brms` takes the view that if the predictors are centred (which they are by default), the mean of the response will be close to 0.5. On the logit scale, this equates to a value of 0. Hence, the prior on the intercept is centred at 0. For logistic regression, the response values must all be either 0 or 1. On the logit scale, values can range from -Inf to Inf. Nevertheless, values on the logit scale that exceed 2.5 in magnitude equate to values on the response scale that are either very high (if positive) or very low (if negative). `brms` therefore uses a scale of 2.5 for the prior on the intercept.
- for the beta coefficients (in this case, just the slope), the default prior is an improper flat prior. A flat prior essentially means that any value between negative infinity and positive infinity is equally likely. Whilst this might seem reckless, in practice it seems to work reasonably well for non-intercept beta parameters.
- there are no distributional parameters in a binomial model and therefore there are no additional priors.
One way to assess the priors is to have the MCMC sampler sample purely from the prior predictive distribution without conditioning on the observed data. Doing so provides a glimpse at the range of predictions possible under the priors. On the one hand, wide ranging predictions would ensure that the priors are unlikely to influence the actual predictions once they are conditioned on the data. On the other hand, if they are too wide, the sampler is being permitted to traverse into regions of parameter space that are not logically possible in the context of the actual underlying ecological context. Not only could this mean that illogical parameter estimates are possible, when the sampler is traversing regions of parameter space that are not supported by the actual data, the sampler can become unstable and have difficulty.
In brms, we can inform the sampler to draw from the prior predictive distribution instead of conditioning on the response, by running the model with the sample_prior='only' argument. Unfortunately, this cannot be applied when there are flat priors (since the posteriors will necessarily extend to negative and positive infinity). Therefore, in order to use this useful routine, we need to make sure that we have defined a proper prior for all parameters.
In this case, we will define alternative priors for the slope. Default priors for the intercept are already provided (should we wish to use them).
form <- bf(PA | trials(1) ~ RATIO, family = binomial())
priors <- prior(normal(0, 2.5), class = "Intercept") +
prior(normal(0, 1), class = "b")
polis.brm1 <- brm(form,
data = polis,
prior = priors,
sample_prior = "only",
iter = 5000,
warmup = 1000,
chains = 3, cores = 3,
thin = 5,
refresh = 0,
backend = "rstan"
)

Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.
Compiling Stan program...
Start sampling
The following are unlikely to be that useful as the predictions must be bound to the range of [0,1]. Hence, it will be difficult to assess whether the priors could result in very large or small predictions.
Data points may overlap. Use the `jitter` argument to add some amount of
random variation to the location of data points and avoid overplotting.
Conclusions:
- we see that the range of predictions is fairly wide and the slope could range from strongly negative to strongly positive.
The following link provides some guidance about defining priors: https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations
When defining our own priors, we typically do not want them to be auto-scaled.
If we wanted to define our own priors that were less vague, yet still not likely to bias the outcomes, we could try the following priors (mainly plucked out of thin air):
- \(\beta_0\): normal centred at 0 with a standard deviation of 2.5
- mean: 0 - because the mean response when predictors are centred should be approx. 0.5 (which is 0 on logit scale)
- sd: 2.5 - since this represents wide priors on logit scale
- \(\beta_1\): normal centred at 0 with a standard deviation of 0.1
- sd: 0.1 - since `2.5 * sd(polis$PA)/sd(polis$RATIO)` is approximately 0.07
I will also overlay the raw data for comparison.
Although this does allow us to visualise what this prior looks like on the link (logit) scale, it is hard to appreciate how it relates to the response (probability) scale. To explore this, we can simulate draws from a normal distribution, back-transform them to the response scale and plot the resulting density distribution.
dat <- data.frame(sigma = c(2.5, 1.5, 1))
dat <- dat |>
group_by(sigma) |>
reframe(
r = rnorm(10000, 0, sigma),
p = plogis(r)
)
ggplot(dat, aes(x = p)) +
  geom_density(aes(fill = factor(sigma)), alpha = 0.3)

Hence, it appears that a prior with a standard deviation of 2.5 would indicate that we believe that probabilities close to either 0 or 1 are the most likely. Given that priors are applied to the centred predictors (which means that the intercept is supposed to represent the midpoint - and thus presumably close to the switch point), this would seem inappropriate. Surely we should expect a probability of close to 0.5 at this intercept?
If this logic holds, then a prior standard deviation of 1.5 or even 1 would seem more sensible.
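The visual impression from the density plot can be quantified by computing how much prior mass implies probabilities below 0.1 or above 0.9 (a small sketch; the `extreme_mass()` helper is introduced here for illustration):

```r
# Share of prior mass implying extreme probabilities, for intercept priors
# N(0, sigma) applied on the logit scale.
set.seed(42)
extreme_mass <- function(sigma, n = 100000) {
  p <- plogis(rnorm(n, 0, sigma))
  mean(p < 0.1 | p > 0.9)
}
m_wide   <- extreme_mass(2.5)   # approx 0.38: much of the mass sits near 0 and 1
m_narrow <- extreme_mass(1)     # approx 0.03: mass concentrated around 0.5
round(c(m_wide, m_narrow), 2)
```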
form <- bf(PA | trials(1) ~ RATIO, family = binomial())
priors <- prior(normal(0, 1), class = "Intercept") +
prior(normal(0, 0.1), class = "b")
polis.brm2 <- brm(form,
data = polis,
prior = priors,
sample_prior = "only",
iter = 5000,
warmup = 1000,
chains = 3, cores = 3,
thin = 5,
refresh = 0,
backend = "rstan"
)

Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.
Compiling Stan program...
Start sampling
Data points may overlap. Use the `jitter` argument to add some amount of
random variation to the location of data points and avoid overplotting.
The desired updates require recompiling the model
Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.
Compiling Stan program...
Start sampling
Data points may overlap. Use the `jitter` argument to add some amount of
random variation to the location of data points and avoid overplotting.
Data points may overlap. Use the `jitter` argument to add some amount of
random variation to the location of data points and avoid overplotting.
Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.
$N
[1] 19
$Y
[1] 1 1 1 0 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1
$trials
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
$K
[1] 2
$Kc
[1] 1
$X
Intercept RATIO
1 1 15.41
2 1 5.63
3 1 25.92
4 1 15.17
5 1 13.04
6 1 18.85
7 1 30.95
8 1 22.87
9 1 12.01
10 1 11.60
11 1 6.09
12 1 2.28
13 1 4.05
14 1 59.94
15 1 63.16
16 1 22.76
17 1 23.54
18 1 0.21
19 1 2.55
attr(,"assign")
[1] 0 1
$prior_only
[1] 0
attr(,"class")
[1] "standata" "list"
// generated with brms 2.22.0
functions {
}
data {
int<lower=1> N; // total number of observations
array[N] int Y; // response variable
array[N] int trials; // number of trials
int<lower=1> K; // number of population-level effects
matrix[N, K] X; // population-level design matrix
int<lower=1> Kc; // number of population-level effects after centering
int prior_only; // should the likelihood be ignored?
}
transformed data {
matrix[N, Kc] Xc; // centered version of X without an intercept
vector[Kc] means_X; // column means of X before centering
for (i in 2:K) {
means_X[i - 1] = mean(X[, i]);
Xc[, i - 1] = X[, i] - means_X[i - 1];
}
}
parameters {
vector[Kc] b; // regression coefficients
real Intercept; // temporary intercept for centered predictors
}
transformed parameters {
real lprior = 0; // prior contributions to the log posterior
lprior += normal_lpdf(b | 0, 0.1);
lprior += normal_lpdf(Intercept | 0, 1);
}
model {
// likelihood including constants
if (!prior_only) {
// initialize linear predictor term
vector[N] mu = rep_vector(0.0, N);
mu += Intercept + Xc * b;
target += binomial_logit_lpmf(Y | trials, mu);
}
// priors including constants
target += lprior;
}
generated quantities {
// actual population-level intercept
real b_Intercept = Intercept - dot_product(means_X, b);
// additionally sample draws from priors
real prior_b = normal_rng(0,0.1);
real prior_Intercept = normal_rng(0,1);
}
In INLA, the default priors are designed to be diffuse or weak. They are chosen to provide moderate regularisation (to help prevent over-fitting) and help stabilise the computations.
In the above:
- the `formula`, `data` and `family` arguments should be familiar as they are the same as for other models in R.
- `control.compute`: allows us to indicate what additional actions should be performed during the model fitting. In this case, we have indicated:
  - `dic`: Deviance information criterion
  - `waic`: Watanabe-Akaike information criterion
  - `cpo`: out-of-sample estimates (measures of fit)
  - `config`: return the full configuration - to allow drawing from the posterior.
[1] "names.fixed" "summary.fixed"
[3] "marginals.fixed" "summary.lincomb"
[5] "marginals.lincomb" "size.lincomb"
[7] "summary.lincomb.derived" "marginals.lincomb.derived"
[9] "size.lincomb.derived" "mlik"
[11] "cpo" "gcpo"
[13] "po" "waic"
[15] "residuals" "model.random"
[17] "summary.random" "marginals.random"
[19] "size.random" "summary.linear.predictor"
[21] "marginals.linear.predictor" "summary.fitted.values"
[23] "marginals.fitted.values" "size.linear.predictor"
[25] "summary.hyperpar" "marginals.hyperpar"
[27] "internal.summary.hyperpar" "internal.marginals.hyperpar"
[29] "offset.linear.predictor" "model.spde2.blc"
[31] "summary.spde2.blc" "marginals.spde2.blc"
[33] "size.spde2.blc" "model.spde3.blc"
[35] "summary.spde3.blc" "marginals.spde3.blc"
[37] "size.spde3.blc" "logfile"
[39] "misc" "dic"
[41] "mode" "joint.hyper"
[43] "nhyper" "version"
[45] "Q" "graph"
[47] "ok" "cpu.intern"
[49] "cpu.used" "all.hyper"
[51] ".args" "call"
[53] "model.matrix"
Having allowed INLA to formulate its own “minimally informative” priors, it is a good idea to explore what they are. Firstly, out of curiosity, it might be interesting to see what it has chosen. However, more importantly, we need to be able to document what the priors were and the INLA development team make it very clear that there is no guarantee that the default priors will remain the same into the future.
In calculating the posterior mode of the hyperparameters, it is efficient to maximise the sum of the log-likelihood and the log-prior; hence, priors are defined on a log scale. The canonical prior for a variance is the gamma prior, hence in INLA this is a log-gamma.
Priors are also defined according to their mean and precision (the inverse of the variance, rather than the variance itself). Precision is \(1/\sigma^2\).
To explore the default priors used by INLA, we can issue the following on the fitted model:
section=[family]
tag=[INLA.Data1] component=[gaussian]
theta1:
parameter=[log precision]
prior=[loggamma]
param=[1e+00, 5e-05]
section=[fixed]
tag=[(Intercept)] component=[(Intercept)]
beta:
parameter=[(Intercept)]
prior=[normal]
param=[0, 0]
tag=[RATIO] component=[RATIO]
beta:
parameter=[RATIO]
prior=[normal]
param=[0.000, 0.001]
The above indicates:
- the prior on the overall family log-precision (precision being \(1/\sigma^2\)) is a log-gamma with a shape of 1 and a rate of 0.00005. Since this is a log-gamma on the log-precision, it is equivalent to a gamma (shape = \(1\), rate = \(5\times 10^{-5}\)) on the precision. Such a distribution is very wide, with a mean of (\(shape/rate =\)) \(2\times 10^{4}\) and a variance of (\(shape/rate^2 =\)) \(4\times 10^{8}\).
- the prior on the intercept is a Gaussian (normal) distribution with a mean of 0 and a precision of 0 - this is effectively a uniform distribution from \(-\infty\) to \(\infty\).
- the prior on the slope is a Gaussian (normal) distribution with a mean of 0 and a precision of 0.001, which equates to a standard deviation of \((1/0.001)^{0.5} \approx 31.62\). This is also a very, very wide distribution (consider what a slope with a magnitude approaching 100 would mean for the rate at which the log odds of presence changes per unit increase in perimeter to area ratio).
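These conversions are easy to check in base R (a small sketch of the arithmetic above):

```r
# Default INLA prior arithmetic. A gamma(shape = 1, rate = 5e-5) on the
# precision has mean shape/rate and variance shape/rate^2; a precision of
# 0.001 corresponds to a standard deviation of 1/sqrt(0.001).
shape <- 1
rate <- 5e-5
c(mean = shape / rate, variance = shape / rate^2)   # 2e4 and 4e8
round(1 / sqrt(0.001), 2)                           # 31.62, the slope prior sd
```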
Family variance
Non-existent for a binomial model.
Intercept
The default prior on the intercept is a Gaussian with a mean of 0 and a precision of 0 (and thus effectively a flat uniform). Alternatively, we could define priors on the intercept that are more realistic. For example, we know that the middle probability of Uta lizard presence is likely to be close to 0.5 (the observed proportion is 0.5263158). However, the parameters are all on a logit scale. Hence a sensible prior mean would be log(0.5/(1-0.5)) = 0 (if we were to centre RATIO).
We also know that the probability cannot extend above 1 or below 0 - that is, no more than 0.5 either side of this midpoint. A standard deviation of approximately 1 on the logit scale keeps most of the prior mass within this plausible range.
We could use these values as the basis for weakly informative priors on the intercept. Note, as INLA priors are expressed in terms of precision rather than variance, an equivalent prior would be \(\sim{}\mathcal{N}(0, 1)\) (e.g. \(1/(1)^2\)).
Fixed effects
The priors for the fixed effects (slope) is a Gaussian (normal) distributions with mean of 0 and precision (0.001). This implies that the prior for slope has a standard deviation of approximately 31 (since \(\sigma = \frac{1}{\sqrt{\tau}}\)). As a general rule, three standard deviations envelopes most of a distribution, and thus this prior defines a distribution whose density is almost entirely within the range [-93,93]. On a logit scale, this is very large.
In order to generate realistic informative Gaussian priors (for the purpose of constraining the posterior to a logical range) for fixed parameters, the following formulae are useful:
\[ \begin{align} \mu &= \frac{z_2\theta_1 - z_1\theta_2}{z_2-z_1}\\ \sigma &= \frac{\theta_2 - \theta_1}{z_2-z_1} \end{align} \]
where \(\theta_1\) and \(\theta_2\) are the quantiles on the response scale and \(z_1\) and \(z_2\) are the corresponding quantiles on the standard normal scale. Hence, if we considered that the slope is very unlikely to fall outside the range [-10,10] (already a very generous range on a log odds scale), we could specify a normal prior with a mean of \(\mu=\frac{(qnorm(0.975,0,1)\times -10) - (qnorm(0.025,0,1)\times 10)}{qnorm(0.975,0,1)-qnorm(0.025,0,1)} = 0\) and a standard deviation of \(\sigma=\frac{10 - (-10)}{qnorm(0.975,0,1)-qnorm(0.025,0,1)} = 5.102\). In INLA (which defines priors in terms of precision rather than standard deviation), the associated prior would be \(\beta \sim{} \mathcal{N}(0, 0.0384)\).
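The two formulae can be wrapped as small helper functions (hypothetical names, for illustration) to reproduce the \(\sigma = 5.102\) and precision \(0.0384\) quoted above, which follow from assumed slope quantiles of plus and minus 10 at the 2.5% and 97.5% standard-normal quantiles:

```r
# Quantile-matching helpers: theta1/theta2 are the assumed quantiles of the
# parameter on its own scale; z1/z2 the matching standard-normal quantiles.
prior_mu    <- function(theta1, theta2, z1, z2) (z2 * theta1 - z1 * theta2) / (z2 - z1)
prior_sigma <- function(theta1, theta2, z1, z2) (theta2 - theta1) / (z2 - z1)
z1 <- qnorm(0.025)
z2 <- qnorm(0.975)
mu    <- prior_mu(-10, 10, z1, z2)      # 0
sigma <- prior_sigma(-10, 10, z1, z2)   # approx 5.102
tau   <- 1 / sigma^2                    # approx 0.0384, INLA's precision
round(c(mu = mu, sigma = sigma, tau = tau), 4)
```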
In order to define each of the above priors, we could modify the inla call:
6 MCMC sampling diagnostics
The bayesplot package offers a range of MCMC diagnostics as well as posterior predictive checks (PPC), all of which have a convenient plot() interface. Let's start with the MCMC diagnostics.
See list of available diagnostics by name
bayesplot MCMC module:
mcmc_acf
mcmc_acf_bar
mcmc_areas
mcmc_areas_ridges
mcmc_combo
mcmc_dens
mcmc_dens_chains
mcmc_dens_overlay
mcmc_hex
mcmc_hist
mcmc_hist_by_chain
mcmc_intervals
mcmc_neff
mcmc_neff_hist
mcmc_nuts_acceptance
mcmc_nuts_divergence
mcmc_nuts_energy
mcmc_nuts_stepsize
mcmc_nuts_treedepth
mcmc_pairs
mcmc_parcoord
mcmc_rank_ecdf
mcmc_rank_hist
mcmc_rank_overlay
mcmc_recover_hist
mcmc_recover_intervals
mcmc_recover_scatter
mcmc_rhat
mcmc_rhat_hist
mcmc_scatter
mcmc_trace
mcmc_trace_highlight
mcmc_violin
Of these, we will focus on:
- mcmc_trace: this plots the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different shade of blue, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e., should not be displaced above or below any other trace for that parameter).
The chains appear well mixed and very similar
- acf (auto-correlation function): plots the auto-correlation between successive MCMC sample lags for each parameter and each chain
There is no evidence of auto-correlation in the MCMC samples
- Rhat: Rhat is a measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
All Rhat values are below 1.05, suggesting the chains have converged.
- neff (number of effective samples): the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Ratios all very high.
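As a sketch, the focal diagnostics above could be generated as follows (assuming the fitted rstanarm model is named polis.rstanarm3, consistent with the code further below):

```r
library(bayesplot)
# extract the posterior draws once
draws <- as.array(polis.rstanarm3)
pars <- c("(Intercept)", "RATIO")

mcmc_trace(draws, pars = pars)              # trace plots
mcmc_acf_bar(draws, pars = pars)            # autocorrelation
mcmc_rhat_hist(rhat(polis.rstanarm3))       # Rhat values
mcmc_neff_hist(neff_ratio(polis.rstanarm3)) # effective sample size ratios
```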
The rstan package offers a range of MCMC diagnostics. Let's start with the MCMC diagnostics.
Of these, we will focus on:
- stan_trace: this plots the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e., should not be displaced above or below any other trace for that parameter).
The chains appear well mixed and very similar
- stan_ac (auto-correlation function): plots the auto-correlation between successive MCMC sample lags for each parameter and each chain
There is no evidence of auto-correlation in the MCMC samples
- stan_rhat: Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
All Rhat values are below 1.05, suggesting the chains have converged.
- stan_ess (number of effective samples): the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Ratios all very high.
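A sketch of the corresponding rstan calls (assuming the underlying stanfit object can be extracted as polis.rstanarm3$stanfit, as in later code):

```r
library(rstan)
pars <- c("(Intercept)", "RATIO")

stan_trace(polis.rstanarm3$stanfit, pars = pars) # trace plots
stan_ac(polis.rstanarm3$stanfit, pars = pars)    # autocorrelation
stan_rhat(polis.rstanarm3$stanfit)               # Rhat distribution
stan_ess(polis.rstanarm3$stanfit)                # effective sample size ratios
```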
The ggmcmc package also has a set of MCMC diagnostic functions. Let's start with the MCMC diagnostics.
Of these, we will focus on:
- ggs_traceplot: this plots the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e., should not be displaced above or below any other trace for that parameter).
The chains appear well mixed and very similar
- ggs_autocorrelation (autocorrelation function): plots the autocorrelation between successive MCMC sample lags for each parameter and each chain
There is no evidence of autocorrelation in the MCMC samples
- ggs_Rhat: Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
All Rhat values are below 1.05, suggesting the chains have converged.
- ggs_effective (number of effective samples): the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Ratios all very high.
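A sketch of the ggmcmc workflow: the ggs() call converts the posterior draws into a tidy data frame that each of the ggs_* plotting functions then consumes (ggs_effective() assumes a reasonably recent ggmcmc version):

```r
library(ggmcmc)
polis.ggs <- ggs(polis.rstanarm3$stanfit) # tidy data frame of draws

ggs_traceplot(polis.ggs)      # trace plots
ggs_autocorrelation(polis.ggs) # autocorrelation
ggs_Rhat(polis.ggs)           # Rhat values
ggs_effective(polis.ggs)      # effective sample size ratios
```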
The bayesplot package offers a range of MCMC diagnostics as well as Posterior Predictive Checks (PPC), all of which have a convenient plot() interface. Let's start with the MCMC diagnostics.
See list of available diagnostics by name
bayesplot MCMC module:
mcmc_acf
mcmc_acf_bar
mcmc_areas
mcmc_areas_ridges
mcmc_combo
mcmc_dens
mcmc_dens_chains
mcmc_dens_overlay
mcmc_hex
mcmc_hist
mcmc_hist_by_chain
mcmc_intervals
mcmc_neff
mcmc_neff_hist
mcmc_nuts_acceptance
mcmc_nuts_divergence
mcmc_nuts_energy
mcmc_nuts_stepsize
mcmc_nuts_treedepth
mcmc_pairs
mcmc_parcoord
mcmc_rank_ecdf
mcmc_rank_hist
mcmc_rank_overlay
mcmc_recover_hist
mcmc_recover_intervals
mcmc_recover_scatter
mcmc_rhat
mcmc_rhat_hist
mcmc_scatter
mcmc_trace
mcmc_trace_highlight
mcmc_violin
Of these, we will focus on:
- trace: this plots the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different shade of blue, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e., should not be displaced above or below any other trace for that parameter).
The chains appear well mixed and very similar
- acf_bar (autocorrelation function): plots the autocorrelation between successive MCMC sample lags for each parameter and each chain
There is no evidence of autocorrelation in the MCMC samples
- rhat_hist: Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
All Rhat values are below 1.05, suggesting the chains have converged.
- neff_hist (number of effective samples): the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Ratios all very high.
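For a brms fit, the same bayesplot diagnostics are available through brms' mcmc_plot() wrapper, naming the bayesplot function (minus the mcmc_ prefix) via type. A sketch (the model object name polis.brm3 comes from the code further below):

```r
mcmc_plot(polis.brm3, type = "trace")     # trace plots
mcmc_plot(polis.brm3, type = "acf_bar")   # autocorrelation
mcmc_plot(polis.brm3, type = "rhat_hist") # Rhat values
mcmc_plot(polis.brm3, type = "neff_hist") # effective sample size ratios
```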
The rstan package offers a range of MCMC diagnostics. Let's start with the MCMC diagnostics.
Of these, we will focus on:
- stan_trace: this plots the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e., should not be displaced above or below any other trace for that parameter).
The chains appear well mixed and very similar
- stan_ac (autocorrelation function): plots the autocorrelation between successive MCMC sample lags for each parameter and each chain
There is no evidence of autocorrelation in the MCMC samples
- stan_rhat: Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
All Rhat values are below 1.05, suggesting the chains have converged.
- stan_ess (number of effective samples): the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Ratios all very high.
The ggmcmc package also has a set of MCMC diagnostic functions. Let's start with the MCMC diagnostics.
Of these, we will focus on:
- ggs_traceplot: this plots the estimates of each parameter over the post-warmup length of each MCMC chain. Each chain is plotted in a different colour, with each parameter in its own facet. Ideally, each trace should just look like noise without any discernible drift and each of the traces for a specific parameter should look the same (i.e., should not be displaced above or below any other trace for that parameter).
The chains appear well mixed and very similar
- ggs_autocorrelation (autocorrelation function): plots the autocorrelation between successive MCMC sample lags for each parameter and each chain
There is no evidence of autocorrelation in the MCMC samples
- ggs_Rhat: Rhat is a scale reduction factor measure of convergence between the chains. The closer the values are to 1, the more the chains have converged. Values greater than 1.05 indicate a lack of convergence. There will be an Rhat value for each parameter estimated.
All Rhat values are below 1.05, suggesting the chains have converged.
- ggs_effective (number of effective samples): the ratio of the number of effective samples (those not rejected by the sampler) to the number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios that are less than 0.5 for a parameter suggest that the sampler spent considerable time in difficult areas of the sampling domain and rejected more than half of the samples (replacing them with the previous effective sample).
If the ratios are low, tightening the priors may help.
Ratios all very high.
7 Model validation
Posterior predictive checks provide additional diagnostics about the fit of the model. Specifically, they provide a comparison between predictions drawn from the model and the observed data used to train the model.
See list of available diagnostics by name
bayesplot PPC module:
ppc_bars
ppc_bars_grouped
ppc_boxplot
ppc_dens
ppc_dens_overlay
ppc_dens_overlay_grouped
ppc_ecdf_overlay
ppc_ecdf_overlay_grouped
ppc_error_binned
ppc_error_hist
ppc_error_hist_grouped
ppc_error_scatter
ppc_error_scatter_avg
ppc_error_scatter_avg_grouped
ppc_error_scatter_avg_vs_x
ppc_freqpoly
ppc_freqpoly_grouped
ppc_hist
ppc_intervals
ppc_intervals_grouped
ppc_km_overlay
ppc_km_overlay_grouped
ppc_loo_intervals
ppc_loo_pit_ecdf
ppc_loo_pit_overlay
ppc_loo_pit_qq
ppc_loo_ribbon
ppc_pit_ecdf
ppc_pit_ecdf_grouped
ppc_ribbon
ppc_ribbon_grouped
ppc_rootogram
ppc_scatter
ppc_scatter_avg
ppc_scatter_avg_grouped
ppc_stat
ppc_stat_2d
ppc_stat_freqpoly
ppc_stat_freqpoly_grouped
ppc_stat_grouped
ppc_violin_grouped
- dens_overlay: plots the density distribution of the observed data (black line) overlaid on top of 50 density distributions generated from draws from the model (light blue). Ideally, the 50 realisations should be roughly consistent with the observed data.
The model draws appear to be consistent with the observed data.
- error_scatter_avg: this plots the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot. That said, these are not overly interpretable for models involving binary data.
This is not interpretable.
- error_scatter_avg_vs_x: this is similar to a regular residual plot and should be interpreted in the same way. Again, this is not interpretable for binary data.
- intervals: plots the observed data overlaid on top of posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
The modelled predictions are mostly consistent with the observed data. There is really only one exception.
- ribbon: this is just an alternative way of expressing the above plot.
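With rstanarm, these checks are conveniently accessed via pp_check(), naming the bayesplot PPC function (minus the ppc_ prefix) via plotfun. A sketch:

```r
pp_check(polis.rstanarm3, plotfun = "dens_overlay", nreps = 50)
pp_check(polis.rstanarm3, plotfun = "error_scatter_avg")
pp_check(polis.rstanarm3, plotfun = "error_scatter_avg_vs_x", x = polis$RATIO)
pp_check(polis.rstanarm3, plotfun = "intervals", x = polis$RATIO)
pp_check(polis.rstanarm3, plotfun = "ribbon", x = polis$RATIO)
```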
The shinystan package allows the full suite of MCMC diagnostics and posterior predictive checks to be accessed via a web interface.
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
preds <- posterior_predict(polis.rstanarm3, ndraws = 250, summary = FALSE)
polis.resids <- createDHARMa(
simulatedResponse = t(preds),
observedResponse = polis$PA,
fittedPredictedResponse = apply(preds, 2, median),
integerResponse = TRUE
)
plot(polis.resids)
Warning in smooth.construct.tp.smooth.spec(object, dk$data, dk$knots): basis dimension, k, increased to minimum possible
DHARMa: qgam was unable to calculate quantile regression for quantile 0.25. Possibly to few (unique) data points / predictions. The quantile will be ommited in plots and significance calculations.
Warning in smooth.construct.tp.smooth.spec(object, dk$data, dk$knots): basis dimension, k, increased to minimum possible
DHARMa: qgam was unable to calculate quantile regression for quantile 0.5. Possibly to few (unique) data points / predictions. The quantile will be ommited in plots and significance calculations.
Warning in smooth.construct.tp.smooth.spec(object, dk$data, dk$knots): basis dimension, k, increased to minimum possible
DHARMa: qgam was unable to calculate quantile regression for quantile 0.75. Possibly to few (unique) data points / predictions. The quantile will be ommited in plots and significance calculations.
Conclusions:
- the simulated residuals do not suggest any issues with the residuals
- there is no evidence of a lack of fit.
Posterior predictive checks provide additional diagnostics about the fit of the model. Specifically, they provide a comparison between predictions drawn from the model and the observed data used to train the model.
See list of available diagnostics by name
bayesplot PPC module:
ppc_bars
ppc_bars_grouped
ppc_boxplot
ppc_dens
ppc_dens_overlay
ppc_dens_overlay_grouped
ppc_ecdf_overlay
ppc_ecdf_overlay_grouped
ppc_error_binned
ppc_error_hist
ppc_error_hist_grouped
ppc_error_scatter
ppc_error_scatter_avg
ppc_error_scatter_avg_grouped
ppc_error_scatter_avg_vs_x
ppc_freqpoly
ppc_freqpoly_grouped
ppc_hist
ppc_intervals
ppc_intervals_grouped
ppc_km_overlay
ppc_km_overlay_grouped
ppc_loo_intervals
ppc_loo_pit_ecdf
ppc_loo_pit_overlay
ppc_loo_pit_qq
ppc_loo_ribbon
ppc_pit_ecdf
ppc_pit_ecdf_grouped
ppc_ribbon
ppc_ribbon_grouped
ppc_rootogram
ppc_scatter
ppc_scatter_avg
ppc_scatter_avg_grouped
ppc_stat
ppc_stat_2d
ppc_stat_freqpoly
ppc_stat_freqpoly_grouped
ppc_stat_grouped
ppc_violin_grouped
- dens_overlay: plots the density distribution of the observed data (black line) overlaid on top of 50 density distributions generated from draws from the model (light blue). Ideally, the 50 realisations should be roughly consistent with the observed data.
The model draws appear to be consistent with the observed data.
- error_scatter_avg: this plots the observed values against the average residuals. Similar to a residual plot, we do not want to see any patterns in this plot. Note, this is not really that useful for models that involve a binomial response.
This is not really interpretable.
- error_scatter_avg_vs_x: this is similar to a regular residual plot and should be interpreted in the same way. This is not easy to interpret for binomial responses.
- intervals: plots the observed data overlaid on top of posterior predictions associated with each level of the predictor. Ideally, the observed data should all fall within the predictive intervals.
- ribbon: this is just an alternative way of expressing the above plot.
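For the brms fit, the equivalent checks go through brms' own pp_check() method, which takes type and ndraws arguments (and a string x naming a predictor for the types that need one). A sketch (the model name polis.brm3 comes from the code below):

```r
pp_check(polis.brm3, type = "dens_overlay", ndraws = 50)
pp_check(polis.brm3, type = "error_scatter_avg")
pp_check(polis.brm3, type = "intervals", x = "RATIO")
pp_check(polis.brm3, type = "ribbon", x = "RATIO")
```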
The shinystan package allows the full suite of MCMC diagnostics and posterior predictive checks to be accessed via a web interface.
DHARMa residuals provide very useful diagnostics. Unfortunately, we cannot directly use the simulateResiduals() function to generate the simulated residuals. However, if we are willing to calculate some of the components ourselves, we can still obtain the simulated residuals from the fitted stan model.
We need to supply:
- simulated (predicted) responses associated with each observation.
- observed values
- fitted (predicted) responses (averaged) associated with each observation
preds <- polis.brm3 |> posterior_predict(ndraws = 250, summary = FALSE)
polis.resids <- createDHARMa(
simulatedResponse = t(preds),
observedResponse = polis$PA,
fittedPredictedResponse = apply(preds, 2, median),
integerResponse = TRUE
)
polis.resids |> plot()
Warning in smooth.construct.tp.smooth.spec(object, dk$data, dk$knots): basis dimension, k, increased to minimum possible
DHARMa: qgam was unable to calculate quantile regression for quantile 0.25. Possibly to few (unique) data points / predictions. The quantile will be ommited in plots and significance calculations.
Warning in smooth.construct.tp.smooth.spec(object, dk$data, dk$knots): basis dimension, k, increased to minimum possible
DHARMa: qgam was unable to calculate quantile regression for quantile 0.5. Possibly to few (unique) data points / predictions. The quantile will be ommited in plots and significance calculations.
Warning in smooth.construct.tp.smooth.spec(object, dk$data, dk$knots): basis dimension, k, increased to minimum possible
DHARMa: qgam was unable to calculate quantile regression for quantile 0.75. Possibly to few (unique) data points / predictions. The quantile will be ommited in plots and significance calculations.
Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.
wrap_elements(~ testUniformity(polis.resids)) +
wrap_elements(~ plotResiduals(polis.resids, form = factor(rep(1, nrow(polis))))) +
wrap_elements(~ plotResiduals(polis.resids, quantreg = FALSE)) +
wrap_elements(~ testDispersion(polis.resids))
Conclusions:
- the simulated residuals do not suggest any issues with the residuals
- there is no evidence of a lack of fit.
8 Partial effects plots
polis.brm3 |>
epred_draws(newdata = polis) |>
median_hdci() |>
ggplot(aes(x = RATIO, y = .epred)) +
geom_ribbon(aes(ymin = .lower, ymax = .upper), fill = "blue", alpha = 0.3) +
geom_line()
partial.obs <- polis |>
mutate(
fit = fitted(polis.brm3, newdata = polis)[, "Estimate"],
resid = resid(polis.brm3)[, "Estimate"],
Obs = fit + resid
)
polis.brm3 |>
epred_draws(newdata = polis) |>
median_hdci() |>
ggplot(aes(x = RATIO, y = .epred)) +
geom_ribbon(aes(ymin = .lower, ymax = .upper), fill = "blue", alpha = 0.3) +
geom_point(data = partial.obs, aes(y = Obs)) +
geom_line()
9 Model investigation
The summary() method generates simple summaries (mean, standard deviation as well as 10, 50 and 90 percentiles).
Model Info:
function: stan_glm
family: binomial [logit]
formula: PA ~ RATIO
algorithm: sampling
sample: 2400 (posterior sample size)
priors: see help('prior_summary')
observations: 19
predictors: 2
Estimates:
mean sd 10% 50% 90%
(Intercept) 2.5 1.1 1.2 2.4 3.9
RATIO -0.1 0.1 -0.2 -0.1 -0.1
Fit Diagnostics:
mean sd 10% 50% 90%
mean_PPD 0.5 0.1 0.4 0.5 0.7
The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
MCMC diagnostics
mcse Rhat n_eff
(Intercept) 0.0 1.0 2623
RATIO 0.0 1.0 2540
mean_PPD 0.0 1.0 2159
log-posterior 0.0 1.0 2163
For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
Conclusions:
- in the Model Info, we are informed that the total MCMC posterior sample size is 2400 and that there were 19 raw observations.
- the estimated intercept (the expected log odds of Uta lizard presence when the perimeter to area ratio is 0) is 2.51. This is the mean of the posterior distribution for this parameter. If we back-transform this to the odds scale, this becomes 12.27. When the perimeter to area ratio is 0, Uta lizards are 12.27 times more likely to be present than absent. We can also back-transform to the probability scale. When the perimeter to area ratio is 0, the probability of Uta lizards being present is 0.92
- the estimated slope (the rate at which the log odds of presence changes per one unit change in perimeter to area ratio) is -0.15 (mean) or -0.14 (median) with a standard deviation of 0.06. The 80% credibility intervals (spanning the 10 and 90 percentiles above) indicate that we are 80% confident that the slope is between -0.2 and -0.1 - i.e. there is a significant negative trend. When back-transformed onto the odds scale, we see that for a one unit increase in perimeter to area ratio, the odds of Uta lizards being present change by a factor of 0.86 (a decline of approximately 14%).
- Rhat and number of effective samples for each parameter are also provided as MCMC diagnostics and all look good.
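These back-transformations are simple to reproduce directly (a sketch using the rounded estimates from the summary above):

```r
# Intercept on the log-odds (logit) scale
beta0 <- 2.51
exp(beta0)    # odds of presence when RATIO = 0: ~12.3
plogis(beta0) # probability of presence when RATIO = 0: ~0.92

# Slope on the log-odds scale
beta1 <- -0.146
exp(beta1)    # multiplicative change in odds per unit RATIO: ~0.86
```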
polis.rstanarm3$stanfit |>
summarise_draws(
median,
HDInterval::hdi,
rhat, length, ess_bulk, ess_tail
)
# A tibble: 4 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 2.45 0.493 4.59 1.00 2400 2626. 2325.
2 RATIO -0.144 -0.261 -0.0427 1.000 2400 2585. 2122.
3 mean_PPD 0.526 0.316 0.737 1.00 2400 2153. 2294.
4 log-posterior -11.9 -14.3 -11.2 1.000 2400 2175. 2045.
We can also alter the CI level.
polis.rstanarm3$stanfit |>
summarise_draws(
median,
~ HDInterval::hdi(.x, credMass = 0.9),
rhat, length, ess_bulk, ess_tail
)
# A tibble: 4 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 2.45 0.960 4.36 1.00 2400 2626. 2325.
2 RATIO -0.144 -0.240 -0.0560 1.000 2400 2585. 2122.
3 mean_PPD 0.526 0.368 0.737 1.00 2400 2153. 2294.
4 log-posterior -11.9 -13.5 -11.2 1.000 2400 2175. 2045.
tidyMCMC(polis.rstanarm3$stanfit,
estimate.method = "median", conf.int = TRUE,
conf.method = "HPDinterval", rhat = TRUE, ess = TRUE
)
# A tibble: 4 × 7
term estimate std.error conf.low conf.high rhat ess
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
1 (Intercept) 2.51 1.06 0.493 4.59 0.999 2623
2 RATIO -0.146 0.0565 -0.261 -0.0427 0.999 2540
3 mean_PPD 0.531 0.125 0.316 0.737 1.00 2159
4 log-posterior -12.2 1.02 -14.3 -11.2 1.000 2163
Conclusions:
- the estimated intercept (the expected log odds of Uta lizard presence when the perimeter to area ratio is 0) is 2.51. This is the median of the posterior distribution for this parameter. If we back-transform this to the odds scale, this becomes 12.27. When the perimeter to area ratio is 0, Uta lizards are 12.27 times more likely to be present than absent. We can also back-transform to the probability scale. When the perimeter to area ratio is 0, the probability of Uta lizards being present is 0.92
- the estimated slope (the rate at which the log odds of presence changes per one unit change in perimeter to area ratio) is -0.15 (median) with a standard error of 0.06. The 95% credibility intervals indicate that we are 95% confident that the slope is between -0.26 and -0.04 - i.e. there is a significant negative trend. When back-transformed onto the odds scale, we see that for a one unit increase in perimeter to area ratio, the odds of Uta lizards being present change by a factor of 0.86 (a decline of approximately 14%).
- Rhat and number of effective samples for each parameter are also provided and all look good.
# A draws_df: 800 iterations, 3 chains, and 4 variables
(Intercept) RATIO mean_PPD log-posterior
1 1.8 -0.11 0.68 -11
2 2.4 -0.14 0.58 -11
3 2.6 -0.18 0.47 -12
4 4.5 -0.24 0.63 -13
5 1.7 -0.12 0.26 -12
6 4.3 -0.21 0.84 -13
7 2.3 -0.11 0.63 -11
8 1.4 -0.12 0.47 -12
9 2.8 -0.16 0.68 -11
10 2.9 -0.16 0.47 -11
# ... with 2390 more draws
# ... hidden reserved variables {'.chain', '.iteration', '.draw'}
## summarised
polis.rstanarm3$stanfit |>
as_draws_df() |>
summarise_draws(
"median",
~ HDInterval::hdi(.x),
"rhat",
"ess_bulk"
)
# A tibble: 4 × 6
variable median lower upper rhat ess_bulk
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 2.45 0.493 4.59 1.00 2626.
2 RATIO -0.144 -0.261 -0.0427 1.000 2585.
3 mean_PPD 0.526 0.316 0.737 1.00 2153.
4 log-posterior -11.9 -14.3 -11.2 1.000 2175.
## summarised on fractional scale
polis.rstanarm3$stanfit |>
as_draws_df() |>
mutate(across(everything(), exp)) |>
summarise_draws(
"median",
~ HDInterval::hdi(.x),
"rhat",
"ess_bulk"
)
# A tibble: 4 × 6
variable median lower upper rhat ess_bulk
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 11.6 0.558 76.0 1.000 2626.
2 RATIO 0.866 0.770 0.958 1.00 2585.
3 mean_PPD 1.69 1.37 2.09 1.00 2153.
4 log-posterior 0.00000666 0.00000000690 0.0000125 1.000 2175.
polis.draw <- polis.rstanarm3 |> gather_draws(`(Intercept)`, RATIO)
## OR via regex
polis.draw <- polis.rstanarm3 |> gather_draws(`.Intercept.*|RATIO.*`, regex = TRUE)
polis.draw
# A tibble: 4,800 × 5
# Groups: .variable [2]
.chain .iteration .draw .variable .value
<int> <int> <int> <chr> <dbl>
1 1 1 1 (Intercept) 1.83
2 1 2 2 (Intercept) 2.38
3 1 3 3 (Intercept) 2.61
4 1 4 4 (Intercept) 4.46
5 1 5 5 (Intercept) 1.74
6 1 6 6 (Intercept) 4.31
7 1 7 7 (Intercept) 2.27
8 1 8 8 (Intercept) 1.41
9 1 9 9 (Intercept) 2.82
10 1 10 10 (Intercept) 2.94
# ℹ 4,790 more rows
We can then summarise this
# A tibble: 2 × 7
.variable .value .lower .upper .width .point .interval
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 (Intercept) 2.45 0.493 4.59 0.95 median hdci
2 RATIO -0.144 -0.261 -0.0427 0.95 median hdci
polis.rstanarm3 |>
gather_draws(`(Intercept)`, RATIO) |>
ggplot() +
stat_halfeye(aes(x = .value, y = .variable)) +
facet_wrap(~.variable, scales = "free")
We could alternatively express the parameters on the odds (odds ratio) scale.
polis.rstanarm3 |>
gather_draws(`(Intercept)`, RATIO) |>
group_by(.variable) |>
mutate(.value = exp(.value)) |>
median_hdci()
# A tibble: 2 × 7
.variable .value .lower .upper .width .point .interval
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 (Intercept) 11.6 0.558 75.4 0.95 median hdci
2 RATIO 0.866 0.770 0.958 0.95 median hdci
Conclusions:
- the estimated intercept (the expected log odds of Uta lizard presence when the perimeter to area ratio of an island is 0) is 2.45. This is the median of the posterior distribution for this parameter. Back-transformed to the odds scale, this becomes 11.56: when the perimeter to area ratio is 0, Uta lizards are 11.56 times more likely to be present than absent. We can also back-transform to the probability scale. When the perimeter to area ratio is 0, the probability of Uta lizards being present is 0.92
- the estimated slope (the rate at which the log odds of presence changes per one unit change in perimeter to area ratio) is -0.14 (median). The 95% credibility intervals indicate that we are 95% confident that the slope is between -0.26 and -0.04 - i.e. there is a significant negative trend. When back-transformed onto the odds scale, we see that for a one unit increase in perimeter to area ratio, the odds of Uta lizards being present change by a factor of 0.87 (a decline of approximately 13%).
This is purely a graphical depiction of the posteriors.
# A tibble: 2,400 × 11
.chain .iteration .draw `(Intercept)` RATIO accept_stat__ stepsize__
<int> <int> <int> <dbl> <dbl> <dbl> <dbl>
1 1 1 1 1.83 -0.114 0.995 0.635
2 1 2 2 2.38 -0.140 1 0.635
3 1 3 3 2.61 -0.175 1 0.635
4 1 4 4 4.46 -0.238 0.803 0.635
5 1 5 5 1.74 -0.124 0.975 0.635
6 1 6 6 4.31 -0.214 0.835 0.635
7 1 7 7 2.27 -0.110 0.975 0.635
8 1 8 8 1.41 -0.124 0.932 0.635
9 1 9 9 2.82 -0.155 1 0.635
10 1 10 10 2.94 -0.158 0.969 0.635
# ℹ 2,390 more rows
# ℹ 4 more variables: treedepth__ <dbl>, n_leapfrog__ <dbl>, divergent__ <dbl>,
# energy__ <dbl>
# A tibble: 2,400 × 5
.chain .iteration .draw `(Intercept)` RATIO
<int> <int> <int> <dbl> <dbl>
1 1 1 1 1.83 -0.114
2 1 2 2 2.38 -0.140
3 1 3 3 2.61 -0.175
4 1 4 4 4.46 -0.238
5 1 5 5 1.74 -0.124
6 1 6 6 4.31 -0.214
7 1 7 7 2.27 -0.110
8 1 8 8 1.41 -0.124
9 1 9 9 2.82 -0.155
10 1 10 10 2.94 -0.158
# ℹ 2,390 more rows
# A tibble: 2,400 × 5
.chain .iteration .draw `(Intercept)` RATIO
<int> <int> <int> <dbl> <dbl>
1 1 1 1 1.83 -0.114
2 1 2 2 2.38 -0.140
3 1 3 3 2.61 -0.175
4 1 4 4 4.46 -0.238
5 1 5 5 1.74 -0.124
6 1 6 6 4.31 -0.214
7 1 7 7 2.27 -0.110
8 1 8 8 1.41 -0.124
9 1 9 9 2.82 -0.155
10 1 10 10 2.94 -0.158
# ℹ 2,390 more rows
## summarised
polis.rstanarm3 |>
spread_draws(`(Intercept)`, RATIO) |>
summarise_draws(
"median",
~ HDInterval::hdi(.x),
"rhat",
"ess_bulk"
)
# A tibble: 2 × 6
variable median lower upper rhat ess_bulk
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 2.45 0.493 4.59 1.00 2626.
2 RATIO -0.144 -0.261 -0.0427 1.000 2584.
## summarised on fractional scale
polis.rstanarm3 |>
spread_draws(`(Intercept)`, RATIO) |>
mutate(across(everything(), exp)) |>
summarise_draws(
"median",
~ HDInterval::hdi(.x),
"rhat",
"ess_bulk"
)
# A tibble: 2 × 6
variable median lower upper rhat ess_bulk
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 11.6 0.558 76.0 1.000 2626.
2 RATIO 0.866 0.770 0.958 1.00 2584.
# A tibble: 2,400 × 2
`(Intercept)` RATIO
<dbl> <dbl>
1 1.83 -0.114
2 2.38 -0.140
3 2.61 -0.175
4 4.46 -0.238
5 1.74 -0.124
6 4.31 -0.214
7 2.27 -0.110
8 1.41 -0.124
9 2.82 -0.155
10 2.94 -0.158
# ℹ 2,390 more rows
The summary() method generates simple summaries (posterior means, standard errors and 95% credibility intervals).
Family: binomial
Links: mu = logit
Formula: PA | trials(1) ~ RATIO
Data: polis (Number of observations: 19)
Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
total post-warmup draws = 2400
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 2.50 1.02 0.65 4.61 1.00 2167 2251
RATIO -0.14 0.06 -0.25 -0.05 1.00 2233 2127
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Conclusions:
- in the Model Info, we are informed that the total MCMC posterior sample size is 2400 and that there were 19 raw observations.
- the estimated intercept (the expected log-odds of Uta lizards being present when the perimeter to area ratio is 0) is 2.5. This is the mean of the posterior distribution for this parameter. If we back-transform this to the odds scale, it becomes 12.18: when the perimeter to area ratio is 0, Uta lizards are 12.18 times more likely to be present than absent. We can also back-transform to the probability scale: when the perimeter to area ratio is 0, the probability of Uta lizards being present is 0.92.
- the estimated slope (the rate at which the log-odds of presence changes per one unit change in perimeter to area ratio) is -0.14 (posterior mean) with a standard deviation of 0.06. The 95% credible interval indicates that we are 95% confident that the slope is between -0.25 and -0.05 - i.e. there is a significant negative trend. When back-transformed onto the odds scale, we see that for a one unit increase in perimeter to area ratio, the odds of Uta lizards being present decline by a factor of 0.87 (roughly a 13% decline).
- Rhat and number of effective samples for each parameter are also provided as MCMC diagnostics and all look good.
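These back-transformations are simple arithmetic on the point estimates; a minimal sketch using the values quoted above (taken from the summary table, not re-computed from the model):

```r
## Back-transforming logit-scale estimates (values from the summary above)
beta0 <- 2.50  # posterior mean intercept (log-odds of presence at RATIO = 0)
beta1 <- -0.14 # posterior mean slope (change in log-odds per unit RATIO)

exp(beta0)    # odds of presence at RATIO = 0 (~12.18)
plogis(beta0) # probability of presence at RATIO = 0 (~0.92)
exp(beta1)    # multiplicative change in odds per unit RATIO (~0.87)
```

Note that, strictly, a median on the odds scale should be computed by exponentiating the full set of draws and then summarising (as in the summarise_draws() examples); back-transforming a posterior summary is not quite the same thing, although for these point estimates the difference is negligible.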
polis.brm3$fit |>
tidyMCMC(
estimate.method = "median", conf.int = TRUE,
conf.method = "HPDinterval", rhat = TRUE, ess = TRUE
  )
# A tibble: 6 × 7
term estimate std.error conf.low conf.high rhat ess
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
1 b_Intercept 2.50 1.02 0.605 4.54 1.00 2134
2 b_RATIO -0.143 0.0550 -0.250 -0.0422 1.00 2271
3 Intercept -0.180 0.536 -1.19 0.953 1.00 2279
4 prior_Intercept 0.0134 1.01 -1.97 1.95 1.000 2407
5 prior_b -0.000558 0.0993 -0.191 0.199 0.999 2432
6 lprior -0.869 0.941 -2.69 0.428 1.00 2287
Conclusions:
- the estimated intercept (the expected log-odds of Uta lizards being present when the perimeter to area ratio is 0) is 2.5. This is the median of the posterior distribution for this parameter. If we back-transform this to the odds scale, it becomes 12.18: when the perimeter to area ratio is 0, Uta lizards are 12.18 times more likely to be present than absent. We can also back-transform to the probability scale: when the perimeter to area ratio is 0, the probability of Uta lizards being present is 0.92.
- the estimated slope (the rate at which the log-odds of presence changes per one unit change in perimeter to area ratio) is -0.14 (posterior median) with a standard error of 0.06. The 95% credible interval indicates that we are 95% confident that the slope is between -0.25 and -0.04 - i.e. there is a significant negative trend. When back-transformed onto the odds scale, we see that for a one unit increase in perimeter to area ratio, the odds of Uta lizards being present decline by a factor of 0.87 (roughly a 13% decline).
- Rhat and number of effective samples for each parameter are also provided and all look good.
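Another way to express the evidence for a decline is the proportion of posterior slope draws that fall below zero. A self-contained sketch: the rnorm() draws here are only a stand-in for the real posterior (with the fitted model you would use as_draws_df(polis.brm3)$b_RATIO), built from the point estimate and standard error quoted above:

```r
## Proportion of (stand-in) posterior slope draws below zero
set.seed(1)
slope_draws <- rnorm(2400, mean = -0.14, sd = 0.06) # stand-in for the real posterior
mean(slope_draws < 0) # close to 1: nearly all draws support a negative slope
```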
# A tibble: 7 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 2.47 0.605 4.54 1.00 2400 2166. 2251.
2 b_RATIO -0.139 -0.250 -0.0422 1.00 2400 2233. 2127.
3 Intercept -0.177 -1.19 0.953 1.00 2400 2303. 2383.
4 prior_Intercept 0.00672 -1.97 1.95 1.00 2400 2412. 2356.
5 prior_b -0.00275 -0.191 0.199 1.000 2400 2441. 2355.
6 lprior -0.658 -2.69 0.428 1.00 2400 2255. 2094.
7 lp__ -8.73 -11.0 -8.02 1.00 2400 2419. 2272.
We can also alter the CI level.
polis.brm3 |>
summarise_draws(
median,
~ HDInterval::hdi(.x, credMass = 0.9),
rhat, length, ess_bulk, ess_tail
  )
# A tibble: 7 × 8
variable median lower upper rhat length ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 2.47 0.760 4.07 1.00 2400 2166. 2251.
2 b_RATIO -0.139 -0.234 -0.0555 1.00 2400 2233. 2127.
3 Intercept -0.177 -1.08 0.674 1.00 2400 2303. 2383.
4 prior_Intercept 0.00672 -1.60 1.69 1.00 2400 2412. 2356.
5 prior_b -0.00275 -0.166 0.161 1.000 2400 2441. 2355.
6 lprior -0.658 -2.15 0.422 1.00 2400 2255. 2094.
7 lp__ -8.73 -10.3 -8.02 1.00 2400 2419. 2272.
To narrow down to just the parameters of interest, see the code under the tidy_draws tab.
# A draws_df: 800 iterations, 3 chains, and 7 variables
b_Intercept b_RATIO Intercept prior_Intercept prior_b lprior lp__
1 1.75 -0.078 0.2968 -1.26 0.024 0.120 -8.7
2 1.69 -0.111 -0.3847 -1.95 -0.050 -0.220 -8.3
3 5.31 -0.281 0.0443 0.93 0.133 -3.490 -11.3
4 3.22 -0.171 0.0097 0.14 -0.087 -1.001 -8.4
5 3.29 -0.195 -0.3687 -0.83 -0.108 -1.505 -8.7
6 1.45 -0.046 0.5809 0.95 -0.062 0.189 -10.2
7 2.72 -0.174 -0.5402 0.92 0.160 -1.193 -8.5
8 2.48 -0.072 1.1307 0.24 0.027 -0.434 -11.5
9 1.77 -0.102 -0.1363 0.79 0.134 -0.060 -8.2
10 0.28 -0.058 -0.8060 -0.29 0.023 -0.028 -10.8
# ... with 2390 more draws
# ... hidden reserved variables {'.chain', '.iteration', '.draw'}
## summarised
polis.brm3 |>
as_draws_df() |>
summarise_draws(
"median",
~ HDInterval::hdi(.x),
"rhat",
"ess_bulk"
  )
# A tibble: 7 × 6
variable median lower upper rhat ess_bulk
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 2.47 0.605 4.54 1.00 2166.
2 b_RATIO -0.139 -0.250 -0.0422 1.00 2233.
3 Intercept -0.177 -1.19 0.953 1.00 2303.
4 prior_Intercept 0.00672 -1.97 1.95 1.00 2412.
5 prior_b -0.00275 -0.191 0.199 1.000 2441.
6 lprior -0.658 -2.69 0.428 1.00 2255.
7 lp__ -8.73 -11.0 -8.02 1.00 2419.
## summarised on fractional scale
polis.brm3 |>
as_draws_df() |>
dplyr::select(starts_with("b_")) |>
mutate(across(everything(), exp)) |>
summarise_draws(
"median",
~ HDInterval::hdi(.x),
"rhat",
"ess_bulk"
  )
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 6
variable median lower upper rhat ess_bulk
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 11.8 0.672 71.0 1.000 2139.
2 b_RATIO 0.870 0.775 0.954 1.00 2215.
polis.draw <- polis.brm3 |> gather_draws(b_Intercept, b_RATIO)
## OR via regex
polis.draw <- polis.brm3 |> gather_draws(`b_.*`, regex = TRUE)
polis.draw
# A tibble: 4,800 × 5
# Groups: .variable [2]
.chain .iteration .draw .variable .value
<int> <int> <int> <chr> <dbl>
1 1 1 1 b_Intercept 1.75
2 1 2 2 b_Intercept 1.69
3 1 3 3 b_Intercept 5.31
4 1 4 4 b_Intercept 3.22
5 1 5 5 b_Intercept 3.29
6 1 6 6 b_Intercept 1.45
7 1 7 7 b_Intercept 2.72
8 1 8 8 b_Intercept 2.48
9 1 9 9 b_Intercept 1.77
10 1 10 10 b_Intercept 0.280
# ℹ 4,790 more rows
We can then summarise this (e.g. via polis.draw |> median_hdci(), which yields median point estimates with highest-density intervals)
# A tibble: 2 × 7
.variable .value .lower .upper .width .point .interval
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 b_Intercept 2.47 0.538 4.50 0.95 median hdci
2 b_RATIO -0.139 -0.250 -0.0422 0.95 median hdci
polis.brm3 |>
gather_draws(b_Intercept, b_RATIO) |>
ggplot() +
stat_halfeye(aes(x = .value, y = .variable)) +
  facet_wrap(~.variable, scales = "free")
polis.draw |>
  ggplot() +
  stat_halfeye(aes(
    x = .value, y = .variable,
    fill = after_stat(ggdist::cut_cdf_qi(cdf,
      .width = c(0.5, 0.8, 0.95),
      labels = scales::percent_format()
    ))
  )) +
  scale_fill_brewer("Interval", direction = -1, na.translate = FALSE) +
  facet_wrap(~.variable, scales = "free") +
  theme_bw()
We could alternatively express the parameters on the odds (odds ratio) scale.
polis.brm3 |>
gather_draws(b_Intercept, b_RATIO) |>
group_by(.variable) |>
mutate(.value = exp(.value)) |>
  median_hdci()
# A tibble: 2 × 7
.variable .value .lower .upper .width .point .interval
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 b_Intercept 11.8 0.672 71.0 0.95 median hdci
2 b_RATIO 0.870 0.775 0.954 0.95 median hdci
Conclusions:
- the estimated intercept (the expected log-odds of Uta lizards being present when the perimeter to area ratio of an island is 0) is 2.47. This is the median of the posterior distribution for this parameter. On the odds scale this is 11.79: when the perimeter to area ratio is 0, Uta lizards are 11.79 times more likely to be present than absent. We can also back-transform to the probability scale: when the perimeter to area ratio is 0, the probability of Uta lizards being present is 0.92.
- the estimated slope (the rate at which the log-odds of presence changes per one unit change in perimeter to area ratio) is -0.14 (posterior median). The 95% credible interval indicates that we are 95% confident that the slope is between -0.25 and -0.04 - i.e. there is a significant negative trend. When back-transformed onto the odds scale, we see that for a one unit increase in perimeter to area ratio, the odds of Uta lizards being present decline by a factor of 0.87 (roughly a 13% decline).
This is purely a graphical depiction of the posteriors.
# A tibble: 2,400 × 16
.chain .iteration .draw b_Intercept b_RATIO Intercept prior_Intercept prior_b
<int> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 1 1.75 -0.0776 0.297 -1.26 0.0236
2 1 2 2 1.69 -0.111 -0.385 -1.95 -0.0498
3 1 3 3 5.31 -0.281 0.0443 0.928 0.133
4 1 4 4 3.22 -0.171 0.00971 0.141 -0.0872
5 1 5 5 3.29 -0.195 -0.369 -0.828 -0.108
6 1 6 6 1.45 -0.0463 0.581 0.952 -0.0623
7 1 7 7 2.72 -0.174 -0.540 0.923 0.160
8 1 8 8 2.48 -0.0721 1.13 0.237 0.0275
9 1 9 9 1.77 -0.102 -0.136 0.786 0.134
10 1 10 10 0.280 -0.0580 -0.806 -0.292 0.0230
# ℹ 2,390 more rows
# ℹ 8 more variables: lprior <dbl>, lp__ <dbl>, accept_stat__ <dbl>,
# stepsize__ <dbl>, treedepth__ <dbl>, n_leapfrog__ <dbl>, divergent__ <dbl>,
# energy__ <dbl>
# A tibble: 2,400 × 5
.chain .iteration .draw b_Intercept b_RATIO
<int> <int> <int> <dbl> <dbl>
1 1 1 1 1.75 -0.0776
2 1 2 2 1.69 -0.111
3 1 3 3 5.31 -0.281
4 1 4 4 3.22 -0.171
5 1 5 5 3.29 -0.195
6 1 6 6 1.45 -0.0463
7 1 7 7 2.72 -0.174
8 1 8 8 2.48 -0.0721
9 1 9 9 1.77 -0.102
10 1 10 10 0.280 -0.0580
# ℹ 2,390 more rows
# A tibble: 2,400 × 5
.chain .iteration .draw b_Intercept b_RATIO
<int> <int> <int> <dbl> <dbl>
1 1 1 1 1.75 -0.0776
2 1 2 2 1.69 -0.111
3 1 3 3 5.31 -0.281
4 1 4 4 3.22 -0.171
5 1 5 5 3.29 -0.195
6 1 6 6 1.45 -0.0463
7 1 7 7 2.72 -0.174
8 1 8 8 2.48 -0.0721
9 1 9 9 1.77 -0.102
10 1 10 10 0.280 -0.0580
# ℹ 2,390 more rows
## summarised
polis.brm3 |>
as_draws_df() |>
dplyr::select(starts_with("b_")) |>
summarise_draws(
"median",
~ HDInterval::hdi(.x),
"rhat",
"ess_bulk"
  )
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 6
variable median lower upper rhat ess_bulk
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 2.47 0.605 4.54 1.000 2139.
2 b_RATIO -0.139 -0.250 -0.0422 1.00 2215.
## summarised on fractional scale
polis.brm3 |>
as_draws_df() |>
dplyr::select(starts_with("b_")) |>
mutate(across(everything(), exp)) |>
summarise_draws(
"median",
~ HDInterval::hdi(.x),
"rhat",
"ess_bulk"
  )
Warning: Dropping 'draws_df' class as required metadata was removed.
# A tibble: 2 × 6
variable median lower upper rhat ess_bulk
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 11.8 0.672 71.0 1.000 2139.
2 b_RATIO 0.870 0.775 0.954 1.00 2215.
polis.brm3 |>
tidy_draws() |>
exp() |>
summarise_draws(median, HDInterval::hdi, rhat, ess_bulk, ess_tail) |>
  filter(variable %in% c("b_Intercept", "b_RATIO"))
# A tibble: 2 × 7
variable median lower upper rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 11.8 0.672 71.0 1.00 2166. 2251.
2 b_RATIO 0.870 0.775 0.954 1.00 2233. 2127.
polis.brm3 |>
tidy_draws() |>
exp() |>
dplyr::select(starts_with("b_")) |>
  summarise_draws(median, HDInterval::hdi, rhat, ess_bulk, ess_tail)
# A tibble: 2 × 7
variable median lower upper rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 b_Intercept 11.8 0.672 71.0 1.000 2138. 2234.
2 b_RATIO 0.870 0.775 0.954 1.00 2214. 2105.
The above estimates are on the odds (odds-ratio) scale.
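Odds ratios below 1 are often easier to communicate as a percentage change in the odds; a one-line conversion using the point estimate from the table above:

```r
## Convert the odds ratio for RATIO into a percentage change in the odds
or <- 0.870    # exp(slope), from the table above
(1 - or) * 100 # ~13% decline in the odds of presence per unit increase in RATIO
```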
Warning: Method 'posterior_samples' is deprecated. Please see ?as_draws for
recommended alternatives.
# A tibble: 2,400 × 7
b_Intercept b_RATIO Intercept prior_Intercept prior_b lprior lp__
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1.75 -0.0776 0.297 -1.26 0.0236 0.120 -8.74
2 1.69 -0.111 -0.385 -1.95 -0.0498 -0.220 -8.26
3 5.31 -0.281 0.0443 0.928 0.133 -3.49 -11.3
4 3.22 -0.171 0.00971 0.141 -0.0872 -1.00 -8.41
5 3.29 -0.195 -0.369 -0.828 -0.108 -1.50 -8.65
6 1.45 -0.0463 0.581 0.952 -0.0623 0.189 -10.2
7 2.72 -0.174 -0.540 0.923 0.160 -1.19 -8.47
8 2.48 -0.0721 1.13 0.237 0.0275 -0.434 -11.5
9 1.77 -0.102 -0.136 0.786 0.134 -0.0600 -8.16
10 0.280 -0.0580 -0.806 -0.292 0.0230 -0.0280 -10.8
# ℹ 2,390 more rows
Warning:
`modelsummary` uses the `performance` package to extract goodness-of-fit
statistics from models of this class. You can specify the statistics you wish
to compute by supplying a `metrics` argument to `modelsummary`, which will then
push it forward to `performance`. Acceptable values are: "all", "common",
"none", or a character vector of metrics names. For example: `modelsummary(mod,
metrics = c("RMSE", "R2")` Note that some metrics are computationally
expensive. See `?performance::performance` for details.
This warning appears once per session.
| (1) | Est. | 2.5 % | 97.5 % |
|---|---|---|---|
| b_Intercept | 2.467 | 0.650 | 4.614 |
| b_RATIO | -0.139 | -0.255 | -0.047 |
| Num.Obs. | 19 | ||
| R2 | 0.404 | ||
| ELPD | -8.7 | ||
| ELPD s.e. | 1.9 | ||
| LOOIC | 17.5 | ||
| LOOIC s.e. | 3.8 | ||
| WAIC | 17.4 | ||
| RMSE | 0.36 | ||
polis.brm3 |> modelsummary(
statistic = c("conf.low", "conf.high"),
shape = term ~ statistic,
exponentiate = TRUE
)| (1) | |||
|---|---|---|---|
| Est. | 2.5 % | 97.5 % | |
| b_Intercept | 11.790 | 1.915 | 100.863 |
| b_RATIO | 0.870 | 0.775 | 0.954 |
| Num.Obs. | 19 | ||
| R2 | 0.404 | ||
| ELPD | -8.7 | ||
| ELPD s.e. | 1.9 | ||
| LOOIC | 17.5 | ||
| LOOIC s.e. | 3.8 | ||
| WAIC | 17.4 | ||
| RMSE | 0.36 | ||
modelsummary(list("Raw" = polis.brm3, "Exponentiated" = polis.brm3),
statistic = c("conf.low", "conf.high"),
shape = term ~ statistic,
exponentiate = c(FALSE, TRUE)
)
|  | Raw Est. | Raw 2.5 % | Raw 97.5 % | Exponentiated Est. | Exponentiated 2.5 % | Exponentiated 97.5 % |
|---|---|---|---|---|---|---|
| b_Intercept | 2.467 | 0.650 | 4.614 | 11.790 | 1.915 | 100.863 |
| b_RATIO | -0.139 | -0.255 | -0.047 | 0.870 | 0.775 | 0.954 |
| Num.Obs. | 19 | 19 | ||||
| R2 | 0.404 | 0.404 | ||||
| ELPD | -8.7 | -8.7 | ||||
| ELPD s.e. | 1.9 | 1.9 | ||||
| LOOIC | 17.5 | 17.5 | ||||
| LOOIC s.e. | 3.8 | 3.8 | ||||
| WAIC | 17.4 | 17.4 | ||||
| RMSE | 0.36 | 0.36 | ||||
10 Further analyses
11 Summary figure
## Using emmeans
polis.grid <- with(polis, list(RATIO = seq(min(RATIO), max(RATIO), len = 100)))
newdata <- emmeans(polis.rstanarm3, ~RATIO, at = polis.grid, type = "response") |> as.data.frame()
head(newdata)
    RATIO      prob lower.HPD upper.HPD
0.210000 0.9184521 0.7084722 0.9989927
0.845859 0.9111536 0.7021754 0.9987613
1.481717 0.9034610 0.6901894 0.9984769
2.117576 0.8949086 0.6784342 0.9981273
2.753434 0.8857940 0.6659951 0.9976977
3.389293 0.8759157 0.6522952 0.9946253
Point estimate displayed: median
Results are back-transformed from the logit scale
HPD interval probability: 0.95
ggplot(newdata, aes(y = prob, x = RATIO)) +
geom_point(data = polis, aes(y = PA)) +
geom_line() +
geom_ribbon(aes(ymin = lower.HPD, ymax = upper.HPD), fill = "blue", alpha = 0.3) +
scale_y_continuous("PA") +
scale_x_continuous("RATIO") +
  theme_classic()
## spaghetti plot
newdata <- emmeans(polis.rstanarm3, ~RATIO, at = polis.grid) |>
gather_emmeans_draws() |>
mutate(.value = plogis(.value))
newdata |> head()
# A tibble: 6 × 5
# Groups: RATIO [1]
RATIO .chain .iteration .draw .value
<dbl> <int> <int> <int> <dbl>
1 0.21 NA NA 1 0.858
2 0.21 NA NA 2 0.913
3 0.21 NA NA 3 0.929
4 0.21 NA NA 4 0.988
5 0.21 NA NA 5 0.847
6 0.21 NA NA 6 0.986
ggplot(newdata, aes(y = .value, x = RATIO)) +
geom_line(aes(group = .draw), alpha = 0.01) +
  geom_point(data = polis, aes(y = PA))
## Using emmeans
polis.grid <- with(polis, list(RATIO = seq(min(RATIO), max(RATIO), len = 100)))
newdata <- emmeans(polis.rstanarm3, ~RATIO, at = polis.grid, type = "response") |> as.data.frame()
head(newdata)
    RATIO      prob lower.HPD upper.HPD
0.210000 0.9184521 0.7084722 0.9989927
0.845859 0.9111536 0.7021754 0.9987613
1.481717 0.9034610 0.6901894 0.9984769
2.117576 0.8949086 0.6784342 0.9981273
2.753434 0.8857940 0.6659951 0.9976977
3.389293 0.8759157 0.6522952 0.9946253
Point estimate displayed: median
Results are back-transformed from the logit scale
HPD interval probability: 0.95
ggplot(newdata, aes(y = prob, x = RATIO)) +
geom_point(data = polis, aes(y = PA)) +
geom_line() +
geom_ribbon(aes(ymin = lower.HPD, ymax = upper.HPD), fill = "blue", alpha = 0.3) +
scale_y_continuous(expression(Presence / absence ~ of ~ italic(Uta) ~ lizards)) +
scale_x_continuous(expression(Island ~ perimeter:Area ~ ratio)) +
  theme_classic()
## spaghetti plot
newdata <- emmeans(polis.rstanarm3, ~RATIO, at = polis.grid) |>
gather_emmeans_draws() |>
mutate(.value = plogis(.value))
newdata |> head()
# A tibble: 6 × 5
# Groups: RATIO [1]
RATIO .chain .iteration .draw .value
<dbl> <int> <int> <int> <dbl>
1 0.21 NA NA 1 0.858
2 0.21 NA NA 2 0.913
3 0.21 NA NA 3 0.929
4 0.21 NA NA 4 0.988
5 0.21 NA NA 5 0.847
6 0.21 NA NA 6 0.986
ggplot(newdata, aes(y = .value, x = RATIO)) +
geom_line(aes(group = .draw), alpha = 0.01) +
geom_point(data = polis, aes(y = PA)) +
scale_y_continuous(expression(Presence / absence ~ of ~ italic(Uta) ~ lizards)) +
scale_x_continuous(expression(Island ~ perimeter:Area ~ ratio)) +
  theme_classic()
## Using emmeans
polis.grid <- with(polis, list(RATIO = modelr::seq_range(RATIO, n = 100)))
newdata <- polis.brm3 |>
emmeans(~RATIO, at = polis.grid, type = "response") |>
  as.data.frame()
Warning: Method 'parse_bf' is deprecated. Please use 'brmsterms' instead.
RATIO prob lower.HPD upper.HPD
0.210000 0.9195514 0.7095528 0.9966984
0.845859 0.9127929 0.7017424 0.9963133
1.481717 0.9057784 0.6917655 0.9953424
2.117576 0.8975684 0.6810380 0.9931453
2.753434 0.8892589 0.6704173 0.9910140
3.389293 0.8802890 0.6599640 0.9893594
Point estimate displayed: median
Results are back-transformed from the logit scale
HPD interval probability: 0.95
## Using raw data for points
newdata |>
ggplot(aes(y = prob, x = RATIO)) +
geom_point(data = polis, aes(y = PA)) +
geom_line() +
geom_ribbon(aes(ymin = lower.HPD, ymax = upper.HPD), fill = "blue", alpha = 0.3) +
scale_y_continuous("PA") +
scale_x_continuous("RATIO") +
  theme_classic()
## Using partial residuals for points
partial.obs <- polis |>
bind_cols(
Pred = predict(polis.brm3)[, "Estimate"],
Resid = residuals(polis.brm3)[, "Estimate"]
) |>
mutate(
Obs = round(Pred + Resid, 0)
)
newdata |>
ggplot(aes(y = prob, x = RATIO)) +
geom_point(data = partial.obs, aes(y = Obs)) +
geom_line() +
geom_ribbon(aes(ymin = lower.HPD, ymax = upper.HPD), fill = "blue", alpha = 0.3) +
scale_y_continuous("PA") +
scale_x_continuous("RATIO") +
  theme_classic()
## spaghetti plot
newdata <- emmeans(polis.brm3, ~RATIO, at = polis.grid) |>
gather_emmeans_draws() |>
  mutate(.value = plogis(.value))
Warning: Method 'parse_bf' is deprecated. Please use 'brmsterms' instead.
# A tibble: 6 × 5
# Groups: RATIO [1]
RATIO .chain .iteration .draw .value
<dbl> <int> <int> <int> <dbl>
1 0.21 NA NA 1 0.850
2 0.21 NA NA 2 0.841
3 0.21 NA NA 3 0.995
4 0.21 NA NA 4 0.960
5 0.21 NA NA 5 0.962
6 0.21 NA NA 6 0.808
ggplot(newdata, aes(y = .value, x = RATIO)) +
geom_line(aes(group = .draw), alpha = 0.01) +
  geom_point(data = polis, aes(y = PA))
## Using emmeans
polis.grid <- with(polis, list(RATIO = seq(min(RATIO), max(RATIO), len = 100)))
newdata <- emmeans(polis.brm3, ~RATIO, at = polis.grid, type = "response") |> as.data.frame()
Warning: Method 'parse_bf' is deprecated. Please use 'brmsterms' instead.
RATIO prob lower.HPD upper.HPD
0.210000 0.9195514 0.7095528 0.9966984
0.845859 0.9127929 0.7017424 0.9963133
1.481717 0.9057784 0.6917655 0.9953424
2.117576 0.8975684 0.6810380 0.9931453
2.753434 0.8892589 0.6704173 0.9910140
3.389293 0.8802890 0.6599640 0.9893594
Point estimate displayed: median
Results are back-transformed from the logit scale
HPD interval probability: 0.95
ggplot(newdata, aes(y = prob, x = RATIO)) +
geom_point(data = polis, aes(y = PA)) +
geom_line() +
geom_ribbon(aes(ymin = lower.HPD, ymax = upper.HPD), fill = "blue", alpha = 0.3) +
scale_y_continuous(expression(Presence / absence ~ of ~ italic(Uta) ~ lizards)) +
scale_x_continuous(expression(Island ~ perimeter:Area ~ ratio)) +
  theme_classic()
## spaghetti plot
newdata <- emmeans(polis.brm3, ~RATIO, at = polis.grid) |>
gather_emmeans_draws() |>
mutate(.value = plogis(.value))Warning: Method 'parse_bf' is deprecated. Please use 'brmsterms' instead.
# A tibble: 6 × 5
# Groups: RATIO [1]
RATIO .chain .iteration .draw .value
<dbl> <int> <int> <int> <dbl>
1 0.21 NA NA 1 0.850
2 0.21 NA NA 2 0.841
3 0.21 NA NA 3 0.995
4 0.21 NA NA 4 0.960
5 0.21 NA NA 5 0.962
6 0.21 NA NA 6 0.808
ggplot(newdata, aes(y = .value, x = RATIO)) +
geom_line(aes(group = .draw), alpha = 0.01) +
geom_point(data = polis, aes(y = PA)) +
scale_y_continuous(expression(Presence / absence ~ of ~ italic(Uta) ~ lizards)) +
scale_x_continuous(expression(Island ~ perimeter:Area ~ ratio)) +
theme_classic()
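The emmeans and tidybayes pipelines above all reduce to the same arithmetic: push each posterior draw of the intercept and slope through the inverse-logit over a grid of RATIO values, then summarise across draws. A self-contained sketch of that step (the rnorm() draws and the RATIO range are stand-ins built from the summaries above; with the fitted model you would use as_draws_df(polis.brm3), which also preserves the posterior correlation between the two parameters that these independent stand-ins ignore):

```r
## Manual credible band for the fitted probability curve (stand-in draws)
set.seed(1)
b0 <- rnorm(2400, mean = 2.47, sd = 1.02)  # stand-in intercept draws
b1 <- rnorm(2400, mean = -0.14, sd = 0.06) # stand-in slope draws
ratio <- seq(0.21, 63, length.out = 100)   # approximate observed RATIO range

## 2400 x 100 matrix of predicted probabilities (one row per draw)
pred <- sapply(ratio, function(x) plogis(b0 + b1 * x))

## median and 95% interval of the curve at each RATIO value
band <- apply(pred, 2, quantile, probs = c(0.025, 0.5, 0.975))
```

Plotting band["50%", ] against ratio with a ribbon from the other two rows reproduces, in outline, the summary figures above.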